Q02252 MMSDH HUMAN
Q02253 MMSDH RAT 
Q07536 MMSDH BOVINE 



MMSDH 



P42412 MMSDH B. SUBTILIS

1788042 ALDH B. SUBTILIS 



913941 BADH BRASSICA NAPUS
P46562 ALDH (put.) C. ELEGANS

1353248 ALDH4 HUMAN

P42236 ALDH (put.) B. SUBTILIS | ALDH4
P39634 ALDH B. SUBTILIS

P33008 ALDH (put.) PSEUDOMONAS SP. 

1742508 GABDH E COLI
P43503 ALDH P. PUTIDA

1790871 ALDH COMAMONAS TESTOSTERONI

P25553 ALDHA E COLI 



556221 ALDH11 HUMAN

P25526 SSDH E COLI | ALDH11
P38067 ALDH (put.) YEAST 

P23883 ALDH (put.) E COLI



P19059 ALDH PSEUDOMONAS PUTIDA 
P23105 ALDH PSEUDOMONAS PUTIDA 
587110 ALDH E COLI

A42597 ALDH ALCALIGENES EUTROPHUS 



P23240 ALDH VIBRIO CHOLERAE
576666 ALDH RHODOCOCCUS SP.
1790014 ALDH 8 E COLI 

P37685 ALDHB E COLI 






927643 BADH HORDEUM VULGARE 
520546 BADH SORGHUM BICOLOR
520544 BADH SORGHUM BICOLOR 



118492 BADH SPINACIA OLERACEA
1813538 BADH SPINACIA OLERACEA
17936 BADH BETA VULGARIS 
118490 BADH BETA VULGARIS 
17934 BADH BETA VULGARIS 

166484 ALDH ASPERGILLUS NIGER
P08157 ALDH EMERICELLA NIDULANS
467625 ALDH CLADOSPORIUM HERBARUM

457615 ALDH ALTERNARIA ALTERNATA

1749700 ALDH S. POMBE

P40047 ALDH (put.) YEAST



PLANT 
BETAINE 
ALDH 



CONTINUED ON
FIG. 6B



FIG. 6A 



FUNGUS/ 
PLANT/ANIMAL
ALDH1/2/5/6 



CONTINUED FROM 
FIG. 6A 



P22281 ALDH1 YEAST



P32872 ALDH2 YEAST

529223 ALDH C. ELEGANS

P11884 ALDH2 RAT
465254 ALDH2 MOUSE
P05091 ALDH2 HUMAN
P20000 ALDH2 BOVINE
P12762 ALDH2 HORSE
P30837 ALDH5 HUMAN



PLANT/ANIMAL
P12693 ALDH PSEUDOMONAS OLEOVORANS
P30840 ALDH ENTAMOEBA HISTOLYTICA
P43353 ALDH7 HUMAN



ALDH8 HUMAN 



ALDH3 HUMAN
P11883 ALDH3 RAT






ALDH10 HUMAN
P30839 ALDH10 RAT



ALDH3/7/8/10



ALDH9 HUMAN






118491 BADH E COLI
145405 BADH E COLI
145404 BADH E COLI



ALDH9



0.25 



145402 CHOLINE ALDH E COLI
14919 CHOLINE ALDH E COLI



FIG. 6B 
ALDH1






P00352 ALDH1 HUMAN
P24549 ALDH1 MOUSE
527682 ALDH SHEEP
537498 ALDH1 BOVINE
P15437 ALDH1 HORSE
P13601 ALDH1 RAT

P27463 ALDH1 CHICKEN

408453 ALDH1 SHREW
544482 ALDH6 HUMAN | ALDH6
1743354 ALDH NICOTIANA TABACUM

529223 ALDH CAENORHABDITIS ELEGANS

P30837 ALDH HUMAN | ALDH5
P20000 ALDH2 BOVINE
P12762 ALDH2 HORSE
P05091 ALDH2 HUMAN | ALDH2
P11884 ALDH2 RAT
466254 ALDH2 MOUSE
166484 ALDH ASPERGILLUS NIGER 

P08157 ALDH EMERICELLA NIDULANS
467625 ALDH CLADOSPORIUM HERBARUM
467615 ALDH ALTERNARIA ALTERNATA
P22281 ALDH1 YEAST






FUNGUS/PLANT/ANIMAL
ALDH1/2/5/6



P32872 ALDH2 YEAST
P40047 ALDH (put.) YEAST

1790014 ALDH E COLI

P37685 ALDHB E COLI

576666 ALDH RHODOCOCCUS SP. 
P23240 ALDH VIBRIO CHOLERAE 
520546 BADH SORGHUM BICOLOR

520544 BADH SORGHUM BICOLOR






0.20 



118492 BADH SPINACIA OLERACEA
1813538 BADH SPINACIA OLERACEA
118490 BADH BETA VULGARIS
17934 BADH BETA VULGARIS



PLANT
BETAINE
ALDH



CONTINUED ON 
FIG. 7B 



FIG. 7A 
CONTINUED FROM
FIG. 7A 



P19059 ALDH PSEUDOMONAS PUTIDA
P23105 ALDH PSEUDOMONAS PUTIDA



P23883 ALDH (put.) E COLI
P30840 ALDH1 E. HISTOLYTICA



ALDH3 HUMAN



P43353 ALDH7 HUMAN | ALDH
ALDH8 HUMAN | ALDH6


ALDH3 



P11883 ALDH3 RAT



BACTERIA/PROTOZOAN/ANIMAL
ALDH3/7/8/10



ALDH10



ALDH10 HUMAN
P30839 ALDH10 RAT
1790871 ALDH COMAMONAS TESTOSTERONI
P33008 ALDH (put.) PSEUDOMONAS SP.
1742508 GABDH E COLI



P43503 ALDH P. PUTIDA
P25553 ALDHA E COLI
556221 SSDH HUMAN



SSDH 



P25526 SSDH E COLI

P38067 ALDH (SSDH?) YEAST 

1788042 ALDH E COLI 

1353248 ALDH4 HUMAN 






913941 BADH BRASSICA NAPUS
P49419 ANTIQUITIN HUMAN
P46562 ALDH (put.) C. ELEGANS
P42236 ALDH (put.) B. SUBTILIS
P42412 MMSDH B. SUBTILIS

Q07536 MMSDH BOVINE

Q02252 MMSDH HUMAN

Q02253 MMSDH RAT



ALDH4 



ANTIQUITIN



MMSDH 



P39634 ALDH B. SUBTILIS



145402 ALDH E COLI



0.20 



FIG. 7B 
START WITH A
SINGLE BIOLOGICAL 
SYSTEM 



START WITH A 
SINGLE GENE 



START WITH A 
GENE FAMILY 



RECONSTRUCT A "NETWORK" OF INTERACTING GENES AND PROTEINS



(Diagram: network of interacting GENE nodes)



IDENTIFY A SET OF KEY DOMAINS AND MOTIFS 






SEARCH FOR RELATED MOTIFS IN DATABASES OF KNOWN ORGANISMS



IDENTIFY MEMBERS OF MULTIGENE FAMILIES
FIG. 8



COMPUTE PHYLOGENETIC TREES 






IDENTIFY CLUSTERS OF PARALOGOUS GENES. IDENTIFY 
PARALOGOUS AND ORTHOLOGOUS NETWORKS 



PARALOGOUS NETWORKS 
IN HUMAN 



MISSING NETWORKS 



GENE 



PARALOGOUS NETWORKS
IN MICE, ETC.






MISSING PARALOG 



MISSING ORTHOLOG 






COMPARE REGULATORY SCHEMES, IDENTIFY GENES THAT ARE KNOWN IN ONE
BUT MISSING IN ANOTHER SYSTEM. FIND THE GENES USING EXPERIMENTAL
TECHNIQUES.
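The comparison step above reduces to simple set operations once each gene has been placed in a subfamily cluster from the phylogenetic trees. The Python sketch below assumes those cluster assignments already exist; the function names, species names, subfamily labels and gene identifiers are hypothetical and are not taken from the figures. A subfamily with members in one species and none in the other is reported as a candidate missing ortholog; a shared subfamily with fewer members in one species is reported as a candidate missing paralog.

```python
from collections import defaultdict


def subfamilies_by_species(assignments):
    """Group (species, gene, subfamily) records as {species: {subfamily: set_of_genes}}."""
    table = defaultdict(lambda: defaultdict(set))
    for species, gene, subfamily in assignments:
        table[species][subfamily].add(gene)
    return table


def missing_orthologs(table, reference, target):
    """Subfamilies with members in `reference` but none in `target`."""
    return sorted(set(table[reference]) - set(table[target]))


def missing_paralogs(table, reference, target):
    """Subfamilies shared by both species in which `target` has fewer members."""
    shared = set(table[reference]) & set(table[target])
    return sorted(f for f in shared
                  if len(table[target][f]) < len(table[reference][f]))


if __name__ == "__main__":
    # Hypothetical toy data: two species, two subfamilies.
    records = [
        ("species_1", "g1", "subfamily_X"),
        ("species_1", "g2", "subfamily_X"),
        ("species_1", "g3", "subfamily_Y"),
        ("species_2", "h1", "subfamily_X"),
    ]
    table = subfamilies_by_species(records)
    print(missing_orthologs(table, "species_1", "species_2"))  # ['subfamily_Y']
    print(missing_paralogs(table, "species_1", "species_2"))   # ['subfamily_X']
```

Candidates found this way are only leads; as the flowchart states, the genes themselves still have to be found experimentally.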
(Diagram: regulatory network of nodes 1 to 4 in SPECIES A and SPECIES B; legend: INDUCE, ACTIVATE, INHIBIT)
FIG. 9A

(Diagrams: variant networks of nodes 1 to 4, each labeled A OR B)
FIG. 9B FIG. 9C FIG. 9D
SPECIES A SPECIES B
FIG. 10 
(Diagram: pathway from GENE A to GENE B; legend: INDUCE, ACTIVATE, INHIBIT)

FIG. 11A

(Diagram: pathway from GENE A to GENE B; legend: INDUCE, ACTIVATE, INHIBIT)

FIG. 11B
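The regulatory schemes in FIGS. 9 to 11 can be compared mechanically if each network is stored as a set of directed interactions labeled INDUCE, ACTIVATE or INHIBIT. The Python sketch below is a minimal illustration under that assumption; the function names are hypothetical, and the example edges over nodes "1" to "4" are invented for illustration rather than read from the figures. It lists interactions unique to each species, shared interactions whose effect differs, and genes that appear in one network but not the other.

```python
from typing import Dict, Set, Tuple

# An interaction is (regulator, target); its value is the effect type.
Edge = Tuple[str, str]
Network = Dict[Edge, str]  # effect is one of "induce", "activate", "inhibit"


def compare_networks(a: Network, b: Network):
    """Split interactions into those unique to A, unique to B,
    and shared edges whose effect differs between the two."""
    only_a = {e: a[e] for e in a.keys() - b.keys()}
    only_b = {e: b[e] for e in b.keys() - a.keys()}
    conflicts = {e: (a[e], b[e]) for e in a.keys() & b.keys() if a[e] != b[e]}
    return only_a, only_b, conflicts


def genes_missing(a: Network, b: Network) -> Set[str]:
    """Genes that take part in some interaction in A but in none in B."""
    genes_a = {g for edge in a for g in edge}
    genes_b = {g for edge in b for g in edge}
    return genes_a - genes_b


if __name__ == "__main__":
    # Hypothetical networks; edges are invented for illustration.
    species_a = {("1", "2"): "induce", ("2", "3"): "activate", ("3", "4"): "inhibit"}
    species_b = {("1", "2"): "inhibit", ("2", "4"): "activate"}
    only_a, only_b, conflicts = compare_networks(species_a, species_b)
    print(sorted(only_a))                       # [('2', '3'), ('3', '4')]
    print(sorted(only_b))                       # [('2', '4')]
    print(conflicts)                            # {('1', '2'): ('induce', 'inhibit')}
    print(genes_missing(species_a, species_b))  # {'3'}
```

Keying on the (regulator, target) pair rather than on the full labeled triple is a design choice: it lets the same edge with a different effect surface as a conflict instead of as two unrelated differences.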
START WITH
BIOLOGICAL SYSTEM



COLLECT PROTEINS/GENES 
RELATED TO THIS SYSTEM 



IDENTIFY SET OF THE KEY 
DOMAINS/MOTIFS 



COMPILE SEQUENCE ALIGNMENT
FOR EACH DOMAIN/MOTIF



TRAIN ONE OF THE MOST SENSITIVE 
PROTEIN MOTIF SEARCH ALGORITHMS 
TO IDENTIFY THESE MOTIFS IN 
PROTEIN SEQUENCES 




SEARCH FOR RELATED MOTIFS 
IN HUMAN EST DATABASE 



SEARCH FOR RELATED MOTIFS IN
YEAST AND NEMATODE GENOMES,
THEN COMPARE IDENTIFIED
UNANNOTATED GENES
WITH HUMAN EST DATABASE



ATTEMPT TO RECOVER THE
FULL-SIZE GENES BY
EXPERIMENTAL TECHNIQUES



FIG. 12 
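FIG. 12 calls for training "one of the most sensitive protein motif search algorithms" without naming one; profile hidden Markov models are a common choice for this. The Python sketch below substitutes a much simpler technique, a log-odds position-specific scoring matrix (PSSM) with pseudocounts and a uniform background, to show the train-then-scan idea under those assumptions. The function names and parameters are hypothetical; the training motifs are the CPxC segments (CPIC, CPVC, CPTC) visible in the FIG. 14D alignment, used here only as toy input.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(motifs, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, ungapped motif instances.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("motif instances must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(motifs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(width):
        column = [m[i] for m in motifs]
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(sequence, pssm, threshold=0.0, unknown_penalty=-10.0):
    """Score every window of `sequence`; return (start, score) pairs at or
    above `threshold`, best first. Start positions are 0-based."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, unknown_penalty) for col, aa in zip(pssm, window))
        if score >= threshold:
            hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    pssm = build_pssm(["CPIC", "CPVC", "CPTC"])
    print(scan("VLEMIKEEVTCPICLELL", pssm)[:1])  # best hit starts at index 10
```

A PSSM ignores insertions and deletions inside the motif, which is one reason gapped profile methods are usually preferred for sensitive searches of the EST and genome databases named in FIG. 12.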
H. SAPIENS

C. ELEGANS_e1350092

S. POMBE_013733

S. CEREVISIAE_ElS60992

NICOTIANA_TABACUM_e244568



FIG. 14B 



BASE COUNT 405 a 545 c 493 g 278 t 6 others 
ORIGIN 

1 cagccgaagc amgcaaaaat tcttccagga gctgagcaag agcctggacg cattccctga 
61 ggayttctgt cggcacaagg tgctgcccca gctgctgacc gccttcgagt tcggcaatgc 
121 tggggccgtt gtcctcacgc ccctcttcaa ggtgggcaag ttcctgagcg ctgaggagta 
181 tcagcagaag atcatccctg tggtggtcaa gatgttctca tccactgacc gggccatgcg 
241 catccgcctc ctgcagcaga tggagcagtt catccagtac cttgacgagc caacagtcaa 
301 cacccagatc ttcccccacg tcgtacatgg cttcctggac accaaccctg ccatccggga 
361 gcagacggtc aagtccatgc tgctcctggc cccaaagctg aacgaggcca acctcaatgt 
421 ggagctgatg aagcactttg cacggctaca ggccaaggat gaacagggcc ccatccgctg 
481 caacaccaca gtctgcctgg gcaaaatcgg ctcctacctc agtgctagca ccagacacag 
541 ggtccttacc tctgccttca gccgagccac tagggacccg tttgcaccgt cccgggttgc 
601 gggtgtcctg ggctttgctg ccacccacaa cctctactca atgaacgact gtgcccagaa 
661 gatcctgcct gtgctctgcg gtctcactgt agatcctgag aaatccgtgc gagaccaggc 
721 cttcaaggcm wttcggagct tcctgtccaa attggagtct gtgtcggagg acccgaccca 
781 gctggaggaa gtggagaagg atgtccatgc agcctccagc cctggcatgg
841 agctagctgg gcaggctggg cgtgaccggg gtctcctcac tcacctccaa gctgatccgt 
901 tcgcacccaa ccactgcccc aacagaaacc aacattcccc aaagacccac gcctgaagga 
961 gttcctgccc cagcccccac ccctgttcct gccaccccta caacctcagg ccactgggag 
1021 acgcaggagg aggacaagga cacagcagag gacagcagca ctgctgacag atgggacgac 
1081 gaagactggg gcagcctgga gcaggaggcc gagtctgtgc tggcccagca ggacgactgg 
1141 agcaccgggg gccaagtgag ccgtgctagt caggtcagca actccgacca caaatcctcc 
1201 aaatccccag agtccgactg gagcagctgg gaarctgagg gctcctggga acagggctgg 
1261 caggagccaa gctcccagga gccacctyct gacggtacac ggctggccag cgagtataac 
1321 tggggtggcc cagagtccag cgacaagggc gaccccttcg ctaccctgtc tgcacgtccc 
1381 agcacccagc cgaggccaga ctcttggggt gaggacaact gggagggcct cgagactgac 
1441 agtcgacagg tcaaggctga gctggcccgg aagaagcgcg aggagcggcg gcgggagatg 
1501 gaggccaaac gcgccgagag gaaggtgcca agggccccat gaagctggga gcccggaagc 
1561 tggactgaac cgtggcggtg gcccttcccg gctgcggaga gcccgcccca cagatgtatt 
1621 tattgtacaa accatgtgag cccggccgcc cagccaggcc atctcacgtg tacataatca 
1681 gagccacaat aaattctatt tcacaaaaaa aaaaaaaaaa aaaaaaa 
// 
FIG. 14D
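The BASE COUNT line of the FIG. 14D record summarizes the residues listed under ORIGIN, with IUPAC ambiguity codes (such as m, w, r and y) counted as "others". A minimal Python sketch of that tally follows, assuming the usual GenBank layout of a position number followed by blocks of ten bases; the function name is hypothetical. A recount reproduces the BASE COUNT line only if every ORIGIN line is complete.

```python
from collections import Counter


def base_count(origin_lines):
    """Tally a/c/g/t and all other symbols in GenBank ORIGIN sequence lines."""
    counts = Counter()
    for line in origin_lines:
        fields = line.split()
        # Sequence lines start with a position number; skip 'ORIGIN', '//' and blanks.
        if not fields or not fields[0].isdigit():
            continue
        for block in fields[1:]:
            counts.update(block.lower())
    result = {base: counts[base] for base in "acgt"}
    result["others"] = sum(n for symbol, n in counts.items() if symbol not in "acgt")
    return result


if __name__ == "__main__":
    record = [
        "ORIGIN",
        "1681 gagccacaat aaattctatt tcacaaaaaa aaaaaaaaaa aaaaaaa",
        "//",
    ]
    print(base_count(record))  # {'a': 32, 'c': 6, 'g': 2, 't': 7, 'others': 0}
```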
>sp|P15533|RPT1_MOUSE DOWN REGULATORY PROTEIN OF INTERLEUKIN 2 RECEPTOR
(J03776) rpt-lr [Mus musculus] Length = 353 



Score = 92.0 bits (237), Expect = 6e-20 



Query 194 VMELLEEDLTCPICCSLFDDPRVLPCSHNFCKKCLEGILEGSVRNSMWRPAPFKCPTCRK 373 

V+E+++E++TCPIC L +P C+H+FC+ C+ E S RN+ CP CR 

Sbjct 5 VLEMIKEEVTCPICLELLKEPVSADCNHSFCRACITLNYE-SNRNT---DGKGNCPVCRV 60



Query 374 ETSATGINSLQVNYSLKGIVEKYNKIKISP----KMPVCKGHMGQPLNIFCLTDMQLICG 541
+L+ N + IVE+ K P K+ -K: H G+ L +FC DM +IC 
Sbjct-61- PYP-'FGNLRPNUHVANI-VERLKGFKSIPEEEQKVNIGAQH-GEKLRLFGRKDMMV-IGW-116 

Query 542 ICATRGEHTKHVFCSIEDAYAQERDAFESLFQSF--ETWRRGDALSRLDTMETSK 700

+C EH H IE+ + ++ + + W+ L R+D 
Sbjct 117 LCERSQEHRGHQTALIEEVDQEYKEKLQGALWKLMKKAKICDEWQDDLQLQRVDW 171 

Query 701 RKSLQLMTKDSDKVKEFFEKLQHTLDQKKNEILSDFETMKLAVMQAYDPEINKL 862 

-K}+ + + V+ F+ L+ LD K+NE L + K VM+ + N+L 
Sbjct 172 ENQIQI---NVENVQRQFKGLRDLLDSKENEELQKLKKEKKEVMEKLEESENEL 222

Homology covers the ring finger, the B-box and the beginning of the coiled-coil domain
of the CLL ring finger protein.



FIG. 15 




ACTIVATED CD4+ T CELLS

Rpt1 (REPRESSES EXPRESSION OF IL-2 RECEPTOR)

TBLASTN 2.0.8 [Jan-05-1999]



Reference:

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|2137498|Mad3m
(205 letters) 

gb|AA278224|AA278224 zs77e05.r1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:703520 5'
similar to TR:G1184157 G1184157 MAX-INTERACTING
TRANSCRIPTIONAL REPRESSOR. ;
Length = 430 

Score = 209 bits (526), Expect = 1e-53

Identities = 104/124 (83%), Positives = 116/124 (92%), Gaps = 1/124 (0%)
Frame = +2 

Query: 1 MEPVASNIQVLLQAAEFLERREREAEHGYASLCPHHSPGTVCRRRKPPLQAPGALNSGRS 60 ID14

MEP+ASNIQVLLQAAEFLERREREAEHGYASLCPH SPG + RR+K P QAPGA +SGRS ID15
Sbjct: 56 MEPLASNIQVLLQAAEFLERREREAEHGYASLCPHRSPGPIHRRKKRPPQAPGAQDSGRS 235 ID16

Query: 61 VHNELEKRRRAQLKRCLEQLRQQMPLGVDCTRYTTLSLL-RARVHIQKLEEQEQQARRLK 119 

VHNELEKRRRAQLKRCLE+L+QQMPLG DC RYTTLSLL RAR+HIQKLE+QEQ+AR+LK
Sbjct: 236 VHNELEKRRRAQLKRCLERLKQQMPLGGDCARYTTLSLLRRARMHIQKLEDQEQRARQLK 415 

Query: 120 EKLRS 124 
E+LR+ 

Sbjct: 416 ERLRT 430 

dbj|C02407|C02407 HUMGS0012279, Human Gene Signature, 3'-directed cDNA sequence.
Length = 348 

Score = 97.5 bits (239), Expect = 6e-20 
Identities = 51/63 (80%), Positives = 56/63 (87%)
Frame = +3 

Query: 125 KQQSLQQQLEQLQGLPGARERERLRADSLDSSGLSSERSDSDQEDLEVDVENLVFGTETE 184 ID17
KQQSLQ+   QL+GL GA ERERLRADSLDSSGLSSERSDSDQE+LEVDVE+LVFG E E ID18
Sbjct: 45 KQQSLQRXWMQLRGLAGAAERERLRADSLDSSGLSSERSDSDQEELEVDVESLVFGGEAE 224 ID19

Query: 185 LLQ 187 

LL+

Sbjct: 225 LLR 233 



FIG. 17A 






BASE COUNT 130 a 234 c 258 g 106 t 5 others 
ORIGIN 

1 cagccgcttg ctccggccgg caccctaggc cgcagtccgc caggctgtcg ccgacatgga 
61 acccttggcc agcaacatcc aggtcctgct gcaggcggcc gagttcctgg agcgccgtga 
121 gagagaggcc gagcatggtt atgcgtccct gtgcccgcat cgcagtccag gccccatcca 
181 caggaggaag aagcgacccc cccaggctcc tggcgcgcag gacagcgggc ggtcagtgca
241 caatgaactg gagaagcgca ggagggccca gttgaagcgg tgcctggagc ggctgaagca 
301 gcagatgccc ctgggcggcg actgtgcccg gtacaccacg ctgagcctgc tgcgccgtgc 
361 caggatgcac atccagaagc tggaggatca ggagcagcgg gcccgacagc tcaaggagag 
421 gctgcgcaca aagcagcaga gcctgcagcg gcantggatg cagctccggg ggctggcagg 
481 ngcggccgag cgggagcgnc tgcgggcgga cagtctggac tcctcaggcc tctcctctga 
541 gcgctcagac tcagaccaag aggagctgga ggtggatgtg gagagcctgg tgtttggggg
601 tgaggccgag ctgctgcggg gcttcgtcgc cggccaggag cacagctact cgcacgtcgg 
661 cggcgcctgg ctatgatgtt cctcacccan ggcgggcctc tgccctctta ctcgttgccc 
721 aagcccactt tnc 

FIG. 17B
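TBLASTN compares a protein query with nucleotide sequences translated in all six reading frames, which is why each hit in FIG. 17A reports a Frame. A minimal Python sketch of forward-strand translation with the standard genetic code (NCBI translation table 1) follows; the helper names are hypothetical. Applied to the first two lines of the FIG. 17B sequence from the ATG at position 56, which lies in frame +2, it yields MEPLASNIQVLLQAAEFLERR, which differs from the N-terminus of the Mad3m query in FIG. 17A (MEPVASNIQVLLQAAEFLERR) at a single residue.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def sequence_from_origin(lines):
    """Join GenBank-style sequence lines, dropping position numbers and spaces.
    Pass only sequence lines (not 'ORIGIN' or '//')."""
    return "".join(ch for line in lines for ch in line if ch.isalpha()).upper()


def frame_of(position):
    """Forward reading frame (+1, +2 or +3) of a codon starting at 1-based `position`."""
    return (position - 1) % 3 + 1


def translate(seq, start):
    """Translate `seq` from 1-based `start` to the last complete codon.

    Codons containing ambiguity codes (for example N) are translated as X;
    stop codons appear as '*' and do not end translation.
    """
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(start - 1, len(seq) - 2, 3))


if __name__ == "__main__":
    seq = sequence_from_origin([
        "1 cagccgcttg ctccggccgg caccctaggc cgcagtccgc caggctgtcg ccgacatgga",
        "61 acccttggcc agcaacatcc aggtcctgct gcaggcggcc gagttcctgg agcgccgtga",
    ])
    print(frame_of(56), translate(seq, 56))  # 2 MEPLASNIQVLLQAAEFLERR
```

Reverse-strand frames (-1 to -3), which TBLASTN also searches, would translate the reverse complement in the same way.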